BioBlend with IPython

BioBlend is a Python library for working with Galaxy. It provides a high-level interface for deploying and deleting Galaxy cloud instances from Python. It also supports loading and executing workflows from Python with a few simple calls.
BioBlend: https://github.com/afgane/bioblend

An example of using BioBlend:

from bioblend.galaxy import GalaxyInstance
gi = GalaxyInstance('<Galaxy IP>', key='your API key')
# List the data libraries visible to this user
libs = gi.libraries.get_libraries()
# Inspect a workflow, then run it; input_dataset_map maps workflow inputs to datasets
gi.workflows.show_workflow('workflow ID')
gi.workflows.run_workflow('workflow ID', input_dataset_map)

Run a Galaxy workflow through IPython Notebook

This tutorial shows how to use the BioBlend library with workflow examples, so that Python developers can use Galaxy tools from Python and IPython.

Create Galaxy Instance using BioBlend

This step connects to a Galaxy server using BioBlend. A server URL and an API key are required to connect.
The server URL is the address of the installed Galaxy server, and the API key identifies the user.

We installed Galaxy on the same local machine as IPython, so the Galaxy URL uses the local IP address and the default port 8080, e.g. http://127.0.0.1:8080.
To use a different Galaxy server, e.g. the public Galaxy server hosted by Penn State University, use https://main.g2.bx.psu.edu/.
For galaxy_api_key, a Galaxy user can obtain the string from http://[galaxy_server]/user/api_keys?cntrller=user
The key works like a password, so only logged-in users can obtain it. Here, I used the key d8699f27a08cc6f42a065e39955b6c47 for my account on the local Galaxy server.



In [66]:

    
from bioblend.galaxy import GalaxyInstance

galaxy_url = "http://127.0.0.1:8080"
galaxy_api_key = "d8699f27a08cc6f42a065e39955b6c47"
gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
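The cell above stores the API key directly in the notebook. Because the key works like a password, one possible alternative is to read it from the environment. The following is a minimal sketch, not part of BioBlend: the function name load_galaxy_credentials and the variable names GALAXY_URL and GALAXY_API_KEY are assumptions chosen for illustration.

```python
import os


def load_galaxy_credentials(env=None):
    """Return (url, key) read from an environment-style mapping.

    GALAXY_URL and GALAXY_API_KEY are hypothetical variable names,
    not something Galaxy or BioBlend defines.
    """
    env = os.environ if env is None else env
    # Fall back to the local server address used in this tutorial
    url = env.get("GALAXY_URL", "http://127.0.0.1:8080")
    key = env.get("GALAXY_API_KEY")
    if not key:
        raise RuntimeError("GALAXY_API_KEY is not set")
    return url.rstrip("/"), key


# Possible usage in the notebook (requires the variables to be set):
# galaxy_url, galaxy_api_key = load_galaxy_credentials()
# gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
```

With this approach the notebook can be shared without exposing the key.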

Test the connection with get_histories()

Once the connection is established, retrieving the Galaxy histories is a simple way to check that it works.
get_histories() returns a list of the histories that belong to the logged-in user.



In [69]:

    
hl = gi.histories.get_histories()
hl









    Out[69]:





[{u'deleted': False,
  u'id': u'df7a1f0c02a5b08e',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': True,
  u'tags': [],
  u'url': u'/api/histories/df7a1f0c02a5b08e'},
 {u'deleted': False,
  u'id': u'5969b1f7201f12ae',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/5969b1f7201f12ae'},
 {u'deleted': False,
  u'id': u'a799d38679e985db',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/a799d38679e985db'},
 {u'deleted': False,
  u'id': u'33b43b4e7093c91f',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/33b43b4e7093c91f'},
 {u'deleted': False,
  u'id': u'ebfb8f50c6abde6d',
  u'model_class': u'History',
  u'name': u'Unnamed history',
  u'published': False,
  u'tags': [],
  u'url': u'/api/histories/ebfb8f50c6abde6d'}]
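Each entry in the returned list is a plain dictionary, so it can be processed with ordinary Python. The sketch below condenses such a list into (id, name, published) tuples and skips histories flagged as deleted. The helper name active_histories is hypothetical, and the sample data is an abridged copy of the output above.

```python
def active_histories(histories):
    """Return (id, name, published) for each history not marked deleted."""
    return [
        (h["id"], h["name"], h["published"])
        for h in histories
        if not h.get("deleted", False)
    ]


# Abridged copy of two entries from the output above
sample = [
    {"deleted": False, "id": "df7a1f0c02a5b08e",
     "name": "Unnamed history", "published": True},
    {"deleted": False, "id": "5969b1f7201f12ae",
     "name": "Unnamed history", "published": False},
]

for history_id, name, published in active_histories(sample):
    print(history_id, name, "published" if published else "private")
```

In the notebook, the same function could be applied to hl directly.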

Note.

Galaxy provides analysis tools, each identified by an id, for example CONVERTER_interval_to_bedstrict_0.
In a workflow's JSON file, the "tool_id" field holds this id, e.g. https://gist.github.com/lee212/f1449352334a2268b849
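To illustrate the note, the tool ids can be collected from a workflow JSON file with the standard json module. This is a sketch under the assumption that the file has a top-level "steps" object keyed by numeric strings, with a "tool_id" field per step (null for input steps). The function name and the tiny example document are made up for illustration.

```python
import json


def tool_ids_from_workflow_json(text):
    """Return the tool ids found in a workflow JSON document, in step order."""
    workflow = json.loads(text)
    steps = workflow.get("steps", {})
    # Assumption: step keys are numeric strings such as "0", "1", "2"
    ordered = sorted(steps.items(), key=lambda item: int(item[0]))
    # Input steps carry no tool, so their tool_id is null and is skipped
    return [step["tool_id"] for _, step in ordered if step.get("tool_id")]


# Hypothetical, heavily shortened workflow document
example = """
{"name": "example",
 "steps": {"0": {"tool_id": null, "type": "data_input"},
           "1": {"tool_id": "gops_join_1", "type": "tool"},
           "2": {"tool_id": "sort1", "type": "tool"}}}
"""

print(tool_ids_from_workflow_json(example))
```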

List saved workflows

BioBlend provides basic functions to load and run workflows. get_workflows() returns the list of workflows that belong to a Galaxy user.



In [14]:

    
workflows = gi.workflows.get_workflows()
workflows









    Out[14]:





[{u'id': u'f597429621d6eb2b',
  u'model_class': u'StoredWorkflow',
  u'name': u"Workflow constructed from history 'Unnamed history'",
  u'published': True,
  u'tags': [],
  u'url': u'/api/workflows/f597429621d6eb2b'},
 {u'id': u'1cd8e2f6b131e891',
  u'model_class': u'StoredWorkflow',
  u'name': u'Galaxy 101 (imported from uploaded file)',
  u'published': False,
  u'tags': [],
  u'url': u'/api/workflows/1cd8e2f6b131e891'}]
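Since every entry carries a name and an id, a workflow can also be looked up by name instead of by its position in the list, which may be more robust if the order of the list changes. The helper below is a hypothetical sketch, and the sample list is an abridged copy of the output above.

```python
def find_workflow(workflows, name_fragment):
    """Return the single workflow whose name contains name_fragment."""
    wanted = name_fragment.lower()
    matches = [w for w in workflows if wanted in w["name"].lower()]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one workflow matching %r, found %d"
            % (name_fragment, len(matches))
        )
    return matches[0]


# Abridged copy of the list shown above
sample_workflows = [
    {"id": "f597429621d6eb2b",
     "name": "Workflow constructed from history 'Unnamed history'"},
    {"id": "1cd8e2f6b131e891",
     "name": "Galaxy 101 (imported from uploaded file)"},
]

print(find_workflow(sample_workflows, "Galaxy 101")["id"])
```

Raising an error on zero or multiple matches avoids silently running the wrong workflow.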

Retrieve workflow information

Two workflows are stored in the database. Let's select the second one, named 'Galaxy 101', and see what components it has.
show_workflow() returns detailed information about a workflow, such as its id, inputs, and steps.



In [72]:

    
workflow = workflows[1]
res = gi.workflows.show_workflow(workflow['id'])
res









    Out[72]:





{u'id': u'1cd8e2f6b131e891',
 u'inputs': {u'29': {u'label': u'Features', u'value': u''},
  u'30': {u'label': u'Exons', u'value': u''}},
 u'model_class': u'StoredWorkflow',
 u'name': u'Galaxy 101 (imported from uploaded file)',
 u'published': False,
 u'steps': {u'24': {u'id': 24,
   u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'},
    u'input2': {u'source_step': 29, u'step_output': u'output'}},
   u'tool_id': u'gops_join_1',
   u'type': u'tool'},
  u'25': {u'id': 25,
   u'input_steps': {u'input': {u'source_step': 26,
     u'step_output': u'out_file1'}},
   u'tool_id': u'sort1',
   u'type': u'tool'},
  u'26': {u'id': 26,
   u'input_steps': {u'input1': {u'source_step': 24,
     u'step_output': u'output'}},
   u'tool_id': u'Grouping1',
   u'type': u'tool'},
  u'27': {u'id': 27,
   u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'},
    u'input2': {u'source_step': 28, u'step_output': u'out_file1'}},
   u'tool_id': u'comp1',
   u'type': u'tool'},
  u'28': {u'id': 28,
   u'input_steps': {u'input': {u'source_step': 25,
     u'step_output': u'out_file1'}},
   u'tool_id': u'Show beginning1',
   u'type': u'tool'},
  u'29': {u'id': 29,
   u'input_steps': {},
   u'tool_id': None,
   u'type': u'data_input'},
  u'30': {u'id': 30,
   u'input_steps': {},
   u'tool_id': None,
   u'type': u'data_input'}},
 u'tags': [],
 u'url': u'/api/workflows/1cd8e2f6b131e891'}
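The 'steps' dictionary above records, for each step, which earlier steps feed it ('source_step'). As a sketch of how this structure can be read, the function below derives one valid execution order from those links. It is an illustration written for this tutorial, not a BioBlend function, and the sample data is an abridged copy of the output above (only 'id', 'input_steps' and 'tool_id' are kept).

```python
def execution_order(steps):
    """Return step ids ordered so that every step follows its source steps."""
    needs = {
        step["id"]: {link["source_step"] for link in step["input_steps"].values()}
        for step in steps.values()
    }
    order = []
    done = set()
    while len(order) < len(needs):
        # A step is ready once all of its source steps have been placed
        ready = sorted(
            sid for sid, deps in needs.items()
            if sid not in done and deps <= done
        )
        if not ready:
            raise ValueError("steps contain a cycle or reference a missing step")
        order.extend(ready)
        done.update(ready)
    return order


# Abridged copy of res['steps'] from the output above
steps = {
    "24": {"id": 24, "tool_id": "gops_join_1",
           "input_steps": {"input1": {"source_step": 30},
                           "input2": {"source_step": 29}}},
    "25": {"id": 25, "tool_id": "sort1",
           "input_steps": {"input": {"source_step": 26}}},
    "26": {"id": 26, "tool_id": "Grouping1",
           "input_steps": {"input1": {"source_step": 24}}},
    "27": {"id": 27, "tool_id": "comp1",
           "input_steps": {"input1": {"source_step": 30},
                           "input2": {"source_step": 28}}},
    "28": {"id": 28, "tool_id": "Show beginning1",
           "input_steps": {"input": {"source_step": 25}}},
    "29": {"id": 29, "tool_id": None, "input_steps": {}},
    "30": {"id": 30, "tool_id": None, "input_steps": {}},
}

for step_id in execution_order(steps):
    print(step_id, steps[str(step_id)]["tool_id"])
```

For this workflow the two data inputs (29, 30) come first, followed by the join (24), grouping (26), sort (25), select-first (28) and compare (27) steps.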

Run workflow

run_workflow() executes the workflow on the given input datasets and places the results in the selected history.
It returns the output dataset IDs, which identify the results of the steps in the workflow.



In [49]:

    
# Keys are workflow input step ids; 'hda' marks a history dataset
dataset_map = {'30': {'id': 'cbbbf59e8f08c98c', 'src': 'hda'},
               '29': {'id': '964b37715ec9bd22', 'src': 'hda'}}
outputs = gi.workflows.run_workflow(workflow['id'], dataset_map, history_id='df7a1f0c02a5b08e')  # or history_name='test1withhda'
outputs









    Out[49]:





{u'history': u'df7a1f0c02a5b08e',
 u'outputs': [u'6fc9fbb81c497f69',
  u'6fb17d0cc6e8fae5',
  u'5114a2a207b7caff',
  u'06ec17aefa2d49dd',
  u'b8a0d6158b9961df']}
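The keys of dataset_map ('29' and '30') are the input step ids reported under 'inputs' by show_workflow(). Rather than typing them by hand, the map can be built from the input labels. The following is a minimal sketch; build_dataset_map is a hypothetical helper, and the pairing of labels with dataset ids mirrors the dataset_map used in this tutorial.

```python
def build_dataset_map(workflow_inputs, dataset_ids_by_label, src="hda"):
    """Build a run_workflow()-style dataset map keyed by input step id."""
    dataset_map = {}
    for step_id, info in workflow_inputs.items():
        label = info["label"]
        if label not in dataset_ids_by_label:
            raise KeyError("no dataset given for workflow input %r" % label)
        dataset_map[step_id] = {"id": dataset_ids_by_label[label], "src": src}
    return dataset_map


# Copy of res['inputs'] from the show_workflow() output
inputs = {
    "29": {"label": "Features", "value": ""},
    "30": {"label": "Exons", "value": ""},
}

dataset_map = build_dataset_map(
    inputs,
    {"Exons": "cbbbf59e8f08c98c", "Features": "964b37715ec9bd22"},
)
print(dataset_map)
```

The KeyError makes a missing input visible before the workflow is submitted.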

Two input datasets are used; one of them is 'UCSC Main on Human: knownGene (chr22:1-51304566)'.
Passing its id 'cbbbf59e8f08c98c' to show_dataset() returns detailed information about this input dataset.



In [56]:

    
dataset = gi.datasets.show_dataset('cbbbf59e8f08c98c')
dataset









    Out[56]:





{u'accessible': True,
 u'api_type': u'file',
 u'data_type': u'bed',
 u'deleted': False,
 u'display_apps': [{u'label': u'display in IGB',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Local',
     u'target': u'_blank',
     u'text': u'Local'},
    {u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Web',
     u'target': u'_blank',
     u'text': u'Web'}]},
  {u'label': u'display at Ensembl',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/ensembl_interval/ensembl_Current',
     u'target': u'_blank',
     u'text': u'Current'}]},
  {u'label': u'display at RViewer',
   u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/rviewer_interval/lbl_main',
     u'target': u'_blank',
     u'text': u'main'}]}],
 u'display_types': [],
 u'download_url': u'/api/histories/df7a1f0c02a5b08e/contents/cbbbf59e8f08c98c/display',
 u'file_ext': u'bed',
 u'file_size': 797714,
 u'genome_build': u'hg19',
 u'hda_ldda': u'hda',
 u'hid': 1,
 u'history_id': u'df7a1f0c02a5b08e',
 u'id': u'cbbbf59e8f08c98c',
 u'metadata_chromCol': 1,
 u'metadata_column_names': None,
 u'metadata_column_types': [u'str', u'int', u'int', u'str', u'int', u'str'],
 u'metadata_columns': 6,
 u'metadata_comment_lines': None,
 u'metadata_data_lines': 12410,
 u'metadata_dbkey': u'hg19',
 u'metadata_endCol': 3,
 u'metadata_nameCol': 4,
 u'metadata_startCol': 2,
 u'metadata_strandCol': 6,
 u'metadata_viz_filter_cols': [4],
 u'misc_blurb': u'12,410 regions',
 u'misc_info': u'',
 u'model_class': u'HistoryDatasetAssociation',
 u'name': u'UCSC Main on Human: knownGene (chr22:1-51304566)',
 u'peek': u'<table cellspacing="0" cellpadding="3"><tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th><th>4.Name</th><th>5</th><th>6.Strand</th></tr><tr><td>chr22</td><td>16258185</td><td>16258303</td><td>uc002zlh.1_cds_1_0_chr22_16258186_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16266928</td><td>16267095</td><td>uc002zlh.1_cds_2_0_chr22_16266929_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16268136</td><td>16268181</td><td>uc002zlh.1_cds_3_0_chr22_16268137_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16269872</td><td>16269943</td><td>uc002zlh.1_cds_4_0_chr22_16269873_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16275206</td><td>16275277</td><td>uc002zlh.1_cds_5_0_chr22_16275207_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16277747</td><td>16277885</td><td>uc002zlh.1_cds_6_0_chr22_16277748_r</td><td>0</td><td>-</td></tr></table>',
 u'purged': False,
 u'state': u'ok',
 u'uuid': None,
 u'visible': True,
 u'visualizations': [u'trackster', u'circster', u'scatterplot']}
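The 'state' field above is 'ok' because this input dataset is already complete. Output datasets of a workflow that has just been started may still be in progress, so it can be useful to poll show_dataset() until the state settles. The sketch below assumes that 'ok' means finished and 'error' means failed; the function name, the polling interval and the timeout are illustrative choices, and gi is expected to behave like the GalaxyInstance used in this tutorial.

```python
import time


def wait_until_ok(gi, dataset_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll a dataset until its state is 'ok'; raise on 'error' or timeout."""
    waited = 0.0
    while True:
        state = gi.datasets.show_dataset(dataset_id)["state"]
        if state == "ok":
            return state
        if state == "error":
            raise RuntimeError("dataset %s ended in state 'error'" % dataset_id)
        if waited >= timeout:
            raise RuntimeError(
                "dataset %s still in state %r after %.0f s"
                % (dataset_id, state, waited)
            )
        sleep(interval)
        waited += interval


# Possible usage in the notebook, once outputs has been returned:
# for output_id in outputs['outputs']:
#     wait_until_ok(gi, output_id)
```

The sleep argument exists only so that the function can be exercised without real delays.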

Display outputs in IPython HTML



In [89]:

    
from IPython.display import HTML



In [116]:

    
merged_htmls = ""

# Collect the name and HTML preview ('peek') of every output dataset
for output in outputs['outputs']:
    dataset = gi.datasets.show_dataset(output)
    name = dataset['name']
    html = dataset['peek']
    merged_htmls += "<p><b>%s</b>%s</p>" % (name, html)

HTML(merged_htmls)









    Out[116]:




Join on data 2 and data 1
1.Chrom 2.Start 3.End 4.Name 5 6.Strand 7 8 9 10 11 12
chr22 16258185 16258303 uc002zlh.1_cds_1_0_chr22_16258186_r 0 - chr22 16258278 16258279 rs2845178 0 +
chr22 16266928 16267095 uc002zlh.1_cds_2_0_chr22_16266929_r 0 - chr22 16267031 16267032 rs7292200 0 +
chr22 16266928 16267095 uc002zlh.1_cds_2_0_chr22_16266929_r 0 - chr22 16266963 16266964 rs10154680 0 +
chr22 16266928 16267095 uc002zlh.1_cds_2_0_chr22_16266929_r 0 - chr22 16267011 16267012 rs7290262 0 +
chr22 16266928 16267095 uc002zlh.1_cds_2_0_chr22_16266929_r 0 - chr22 16267037 16267038 rs2818572 0 +
chr22 16269872 16269943 uc002zlh.1_cds_4_0_chr22_16269873_r 0 - chr22 16269933 16269934 rs2845206 0 +

Group on data 8
1 2
uc002zlh.1_cds_1_0_chr22_16258186_r 1
uc002zlh.1_cds_2_0_chr22_16266929_r 4
uc002zlh.1_cds_4_0_chr22_16269873_r 1
uc002zlh.1_cds_5_0_chr22_16275207_r 2
uc002zlh.1_cds_6_0_chr22_16277748_r 5
uc002zlh.1_cds_7_0_chr22_16279195_r 2

Sort on data 9
1 2
uc010gsw.2_cds_1_0_chr22_21480537_r 67
uc021wmb.1_cds_0_0_chr22_21480537_r 67
uc002zoc.3_cds_0_0_chr22_18834445_f 58
uc021wnd.1_cds_0_0_chr22_24647973_f 50
uc021wmc.1_cds_0_0_chr22_21637809_f 47
uc003bhh.3_cds_0_0_chr22_46652458_r 46

Select first on data 10
1 2
uc010gsw.2_cds_1_0_chr22_21480537_r 67
uc021wmb.1_cds_0_0_chr22_21480537_r 67
uc002zoc.3_cds_0_0_chr22_18834445_f 58
uc021wnd.1_cds_0_0_chr22_24647973_f 50
uc021wmc.1_cds_0_0_chr22_21637809_f 47

top 5 exons
1.Chrom 2.Start 3.End 4.Name 5 6.Strand
chr22 18834444 18835833 uc002zoc.3_cds_0_0_chr22_18834445_f 0 +
chr22 21480536 21481925 uc010gsw.2_cds_1_0_chr22_21480537_r 0 -
chr22 21480536 21481925 uc021wmb.1_cds_0_0_chr22_21480537_r 0 -
chr22 21637808 21638558 uc021wmc.1_cds_0_0_chr22_21637809_f 0 +
chr22 24647972 24649256 uc021wnd.1_cds_0_0_chr22_24647973_f 0 +
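The 'peek' field used for this display is an HTML table, which is convenient for rendering but less so for further processing in Python. As a sketch, the table can be converted into a list of rows with the Python 3 standard library. The class and function names are hypothetical, and the sample string is a shortened version of the 'peek' value shown earlier.

```python
from html.parser import HTMLParser


class PeekTableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text fragments while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:
                # Tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell))
            self._cell = None


def peek_to_rows(peek_html):
    """Return the table in peek_html as a list of rows of cell strings."""
    parser = PeekTableParser()
    parser.feed(peek_html)
    parser.close()
    return parser.rows


# Shortened version of a 'peek' value from this tutorial
sample_peek = (
    '<table cellspacing="0" cellpadding="3">'
    "<tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th></tr>"
    "<tr><td>chr22</td><td>16258185</td><td>16258303</td></tr>"
    "</table>"
)

for row in peek_to_rows(sample_peek):
    print(row)
```

The first row holds the column headers and the remaining rows hold the data, all as strings.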

We successfully executed the workflow on Python with BioBlend and displayed the results on IPython.
